Automatic Identification of Lifestyle and Environmental Factors from Social History in Clinical Text
Abstract
Lifestyle and environmental factors play a significant role in both clinical research and clinical care. In clinical research, it has been established that 5-10% of cancers can be attributed to hereditary factors, while 90-95% have been correlated with lifestyle and environmental factors such as smoking, diet and exercise. In clinical care, recording a patient's social history has long been standard practice, because this history affects not only diagnosis but also treatment options. In this work, we therefore propose to automatically identify the lifestyle and environmental factors that clinical caregivers have documented. We extend Milton et al.'s analysis of social and behavioral information and Uzuner et al.'s work on smoking information in discharge summaries.

Dataset
We created a corpus from the MTSamples website (http://www.mtsamples.com/), which provides a large collection of publicly available transcribed medical records. We scraped 516 history and physical notes, since these reports contain rich social history information. We applied our in-house statistical section chunker (http://depts.washington.edu/bionlp/index.html?software) and identified 342 sections tagged as social history in the 516 reports for annotation.

Annotation Process
We created a detailed annotation guideline covering the following lifestyle and environmental factors: (1) substance abuse (smoking, alcohol and drug use), (2) occupation, (3) marital status, (4) family information, (5) residence, (6) living situation, (7) environmental exposures, (8) physical activity, (9) weight management, (10) sexual history, and (11) infectious disease history. We then defined 9 dimensions that may apply to each type of factor. For example, substance abuse (1) is annotated for status (possible values: past, current, none, unknown), time frame (e.g., since 2010), method (e.g., drink, inhale, inject), type (e.g., cigarettes, wine, cocaine), amount (e.g., number of cigarettes or drinks), frequency (e.g., daily, socially, rarely), and history (e.g., after 10 years of smoking), while occupation (2) is annotated for location and extent (e.g., part-time, night-shift).

Using the BRAT rapid annotation tool, two annotators each annotated 20 social history sections. In this first round, inter-annotator agreement was 0.59 F1 across the 11 lifestyle and environmental factors and their 9 dimensions. The annotators then met and resolved all conflicts, and the annotation guideline was updated accordingly. A single annotator is now annotating the rest of the dataset; 120 social history sections have been annotated so far.

Conclusion
The social history section of clinical text contains a wealth of information about a patient's lifestyle and environmental factors, which can be used in both clinical care and clinical research. We are building automated extractors based on the annotated set and will release both the annotated corpus and the extractors to the research community. Our research goal is to apply these extractors to EMRs to facilitate robust correlation studies between these factors and disease outcomes.

Acknowledgements
This work was supported by the University of Washington Institute of Translational Health Sciences (UL1TR000423).
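The factor and dimension scheme described in the annotation process can be pictured as a small data structure. The sketch below is illustrative only: the full guideline is not reproduced in this abstract, so it encodes just the 11 factors, the substance-abuse and occupation dimensions, and the status values named above. The names `FACTORS`, `DIMENSIONS_BY_FACTOR`, `Annotation` and `validate`, and the character-offset span layout, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# The 11 lifestyle and environmental factors listed in the abstract.
FACTORS = (
    "substance_abuse",  # smoking, alcohol and drug use
    "occupation",
    "marital_status",
    "family_information",
    "residence",
    "living_situation",
    "environmental_exposures",
    "physical_activity",
    "weight_management",
    "sexual_history",
    "infectious_disease_history",
)

# Dimensions the abstract names for two factors. Dimensions for the other
# factors are not given there, so they are not checked (an assumption).
DIMENSIONS_BY_FACTOR = {
    "substance_abuse": {"status", "time_frame", "method", "type",
                        "amount", "frequency", "history"},
    "occupation": {"location", "extent"},
}

# Allowed values for the categorical "status" dimension.
STATUS_VALUES = {"past", "current", "none", "unknown"}


@dataclass(frozen=True)
class Annotation:
    """One annotated span in a social history section (hypothetical layout)."""
    factor: str
    start: int                       # character offset, inclusive
    end: int                         # character offset, exclusive
    dimension: Optional[str] = None  # None for the factor mention itself
    value: Optional[str] = None      # normalized value, e.g. "current"


def validate(ann: Annotation) -> List[str]:
    """Return a list of problems; an empty list means the annotation fits the scheme."""
    problems = []
    if ann.factor not in FACTORS:
        problems.append(f"unknown factor: {ann.factor}")
    if ann.start >= ann.end:
        problems.append("empty or inverted span")
    if ann.dimension is not None:
        allowed = DIMENSIONS_BY_FACTOR.get(ann.factor)
        if allowed is not None and ann.dimension not in allowed:
            problems.append(f"{ann.dimension!r} not defined for {ann.factor}")
        if ann.dimension == "status" and ann.value not in STATUS_VALUES:
            problems.append(f"invalid status value: {ann.value!r}")
    return problems


# Text: "Former smoker, quit in 2010. Works part-time as a nurse."
print(validate(Annotation("substance_abuse", 0, 13, "status", "past")))  # []
print(validate(Annotation("occupation", 35, 44, "status", "current")))
# ["'status' not defined for occupation"]
```

Keeping the scheme as plain constants makes it easy to extend once the remaining factor-to-dimension mappings are known.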
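Inter-annotator agreement reported as F1 is commonly computed by treating one annotator's spans as the reference and the other's as predictions. With exact matching, F1 reduces to 2|A ∩ B| / (|A| + |B|), which is symmetric in the two annotators. The abstract does not say whether agreement used exact or partial span matching, or how dimension annotations were counted, so the sketch below assumes exact matching on (label, offsets) pairs read from the text-bound ("T") lines of BRAT standoff `.ann` files. The function names and labels are hypothetical.

```python
from typing import Set, Tuple

# (label, ((start, end), ...)); several fragments for a discontinuous span.
Span = Tuple[str, Tuple[Tuple[int, int], ...]]


def read_text_bound(ann: str) -> Set[Span]:
    """Collect text-bound annotations ("T" lines) from BRAT standoff content.

    A T line looks like "T1<TAB>Label START END<TAB>covered text";
    discontinuous spans separate fragments with ";" (e.g. "0 5;10 15").
    Relation, event, attribute and note lines are ignored.
    """
    spans = set()
    for line in ann.splitlines():
        if not line.startswith("T"):
            continue
        label_and_offsets = line.split("\t")[1]
        label, offsets = label_and_offsets.split(" ", 1)
        fragments = tuple(
            tuple(int(x) for x in frag.split())
            for frag in offsets.split(";")
        )
        spans.add((label, fragments))
    return spans


def agreement_f1(a: Set[Span], b: Set[Span]) -> float:
    """Exact-match F1 between two annotators; symmetric in a and b."""
    if not a and not b:
        return 1.0  # both annotators marked nothing: treat as full agreement
    return 2 * len(a & b) / (len(a) + len(b))


# Text: "Former smoker, quit in 2010. Works part-time as a nurse."
ann_a = "T1\tSubstanceAbuse 0 13\tFormer smoker\nT2\tOccupation 35 44\tpart-time\n"
ann_b = ("T1\tSubstanceAbuse 0 13\tFormer smoker\n"
         "T2\tOccupation 35 55\tpart-time as a nurse\n")
print(agreement_f1(read_text_bound(ann_a), read_text_bound(ann_b)))  # 0.5
```

Exact matching is the strictest choice; a lenient variant would count overlapping spans with the same label as matches, which typically yields higher agreement.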
Publication date: 2016